Forward optimal modeling of acoustic confusions in Mandarin CALL system
Authors
Abstract
Acoustic confusions severely degrade the accuracy of pronunciation assessment in Computer Assisted Language Learning (CALL) systems. This paper presents our recent study on the optimal modeling of these confusions. We replace the traditional Mandarin syllable structure, which is composed of an initial and a final, with a novel phoneme structure. Several phoneme splitting strategies are investigated, and the question list used for building and merging decision trees is studied; the questions are specific to each phoneme splitting strategy. Experiments show that the optimal phoneme splitting strategy outperforms the traditional initial-final structure in our CALL system, with a relative ASER improvement of 11.05% for nasal finals. This idea may be extended to improve the performance of automatic speech recognition (ASR).
Similar resources
Pronunciation Modeling for Spontaneous Mandarin Speech Recognition
Pronunciation variations in spontaneous speech can be classified into complete changes and partial changes. A complete change is the replacement of a canonical phoneme by another alternative phone, such as ‘b’ being pronounced as ‘p’. Partial changes are variations within the phoneme such as nasalization, centralization and voicing. Most current work in pronunciation modeling for spontaneous Man...
Partial Change Accent Models Speech Recog
Regional accents in Mandarin speech result mostly from partial phone changes due to the interlanguage system of non-native speakers. We propose partial change accent models based on accent-specific units with acoustic model reconstruction for accented Mandarin speech recognition. We use phonological rules of dialectical pronunciations together with likelihood ratio test to model actual accented...
Prosodic modeling for improved speech recognition and understanding
The general goal of this thesis is to model the prosodic aspects of speech to improve human-computer dialogue systems. Towards this goal, we investigate a variety of ways of utilizing prosodic information to enhance speech recognition and understanding performance, and address some issues and difficulties in modeling speech prosody during this process. We explore prosodic modeling in two languag...
Performance Analysis of cooperative SWIPT System: Intelligent Reflecting Surface versus Decode-and-Forward
In this paper, we explore the impacts of utilizing intelligent reflecting surfaces (IRS) in a power-splitting based simultaneous wireless information and power transfer (PS-SWIPT) system and compare its performance with the traditional decode and forward relaying system. To analyze a more practical system, it is also assumed that the receiving nodes are subject to decoding cost, and they are on...
Automatic phone set extension with confidence measure for spontaneous speech
Extending the phone set is one common approach for dealing with phonetic confusions in spontaneous speech. We propose using likelihood ratio test as a confidence measure for automatic phone set extension to model phonetic confusions. We first extend the standard phone set using dynamic programming (DP) alignment to cover all possible phonetic confusions in training data. Likelihood ratio test i...